# Synthetic data generalization

Sapiens Pose 0.6b
Sapiens is a family of Vision Transformer models pre-trained on 300 million high-resolution human images, focusing on human-centric vision tasks.
Pose Estimation English
facebook
19
2
Sapiens Depth 0.3b Bfloat16
Sapiens is a series of Vision Transformer models pre-trained on 300 million human images at 1024×1024 resolution, focusing on human-centric vision tasks.
3D Vision English
facebook
22
0
Sapiens Seg 0.6b Bfloat16
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024×1024 resolution, focusing on human-centric vision tasks.
Image Segmentation English
facebook
24
0
Sapiens Pose 1b Bfloat16
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024×1024 resolution, focusing on human-centric vision tasks.
Pose Estimation English
facebook
31
0
Sapiens Pretrain 1b Bfloat16
Sapiens is a Vision Transformer model pre-trained on 300 million human images at 1024×1024 resolution, supporting high-resolution inference and generalizing to real-world scenarios.
Image Classification English
facebook
23
0
Sapiens Pretrain 2b Bfloat16
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024×1024 resolution, supporting high-resolution inference and generalizing to real-world scenarios.
Image Classification English
facebook
20
2
Sapiens Depth 2b
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024×1024 resolution, focusing on human-centric vision tasks.
3D Vision English
facebook
40
3
Sapiens Seg 0.3b
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024×1024 resolution, focusing on human-centric vision tasks.
Image Segmentation English
facebook
48
2
Sapiens Pose 1b
Pose-Sapiens-1B is a high-resolution human pose estimation model based on the Vision Transformer architecture. It is pre-trained on 300 million human images at 1024×1024 resolution and detects 308 keypoints covering the body, face, hands, and feet.
Pose Estimation English
facebook
82
4
Sapiens Pretrain 0.3b
Sapiens is a Vision Transformer model pre-trained on 300 million high-resolution human images, designed specifically for human-centric vision tasks.
Image Classification English
facebook
34
1
Sapiens Pretrain 0.6b
Sapiens is a Vision Transformer model pre-trained on 300 million human images at 1024×1024 resolution, excelling at human-centric vision tasks.
Image Classification English
facebook
13
0
Sapiens Pretrain 1b
Sapiens is a Vision Transformer model pre-trained on 300 million high-resolution human images, focusing on human-centric vision tasks.
Face-related English
facebook
48
1
Sapiens Pretrain 2b
Sapiens-2B is a Vision Transformer model pre-trained on 300 million high-resolution human images, specifically designed for human-centric vision tasks with exceptional generalization capabilities.
Face-related English
facebook
28
2
Sapiens Depth 0.6b Torchscript
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024×1024 resolution, focusing on human-centric vision tasks.
3D Vision English
facebook
34
0
Sapiens Seg 1b Torchscript
Sapiens is a series of Vision Transformers pre-trained on 300 million human images at 1024×1024 resolution, designed specifically for human-centric vision tasks and offering exceptional generalization capabilities.
Image Segmentation English
facebook
892
1
Sapiens Pose 1b Torchscript
Sapiens is a Vision Transformer model pre-trained on 300 million human images at 1024×1024 resolution, designed specifically for high-precision pose estimation.
Pose Estimation English
facebook
1,245
7
Sapiens Pretrain 1b Torchscript
Sapiens is a family of Vision Transformers pre-trained on 300 million human images at 1024×1024 resolution, designed specifically for human-centric vision tasks.
Image Classification English
facebook
35
0
© 2025 AIbase